42 research outputs found

    A knowledge graph embeddings based approach for author name disambiguation using literals

    Scholarly data is growing continuously and contains information about articles from a plethora of venues, including conferences and journals. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize the data and make it accessible have also led to many challenges, such as the exploration of scholarly articles and ambiguous authors. This study specifically targets the problem of Author Name Disambiguation (AND) on scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) using multimodal literal information generated from these KGs. The framework is based on three components: (1) multimodal KGEs, (2) a blocking procedure, and (3) hierarchical agglomerative clustering. Extensive experiments have been conducted on two newly created KGs: (i) a KG containing information from the journal Scientometrics from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines by 8–14% in terms of F1 score and shows competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available through GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855), respectively
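
The blocking and clustering steps named in this abstract can be pictured with a minimal sketch. It is not the LAND implementation: the blocking key (last name plus first initial), the cosine distance, the average linkage, the 0.2 threshold and the toy two-dimensional vectors are all assumptions made for illustration, and the embeddings are taken as given rather than learned from a KG.

```python
from collections import defaultdict
from math import sqrt


def block_key(name):
    """Hypothetical blocking key: last name plus first initial, lowercased."""
    parts = name.lower().split()
    return f"{parts[-1]} {parts[0][0]}"


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerate(vectors, threshold):
    """Average-linkage agglomerative clustering over positions in `vectors`.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def disambiguate(records, threshold=0.2):
    """Group (author_name, embedding) records into inferred authors.

    Records are first split into blocks by name key, then clustered
    inside each block. Returns a sorted list of lists of record indices.
    """
    blocks = defaultdict(list)
    for idx, (name, _) in enumerate(records):
        blocks[block_key(name)].append(idx)

    result = []
    for members in blocks.values():
        local = agglomerate([records[i][1] for i in members], threshold)
        for cluster in local:
            result.append(sorted(members[k] for k in cluster))
    return sorted(result)


# Toy data: made-up names and 2-d vectors standing in for KG embeddings.
records = [
    ("John Smith", [1.0, 0.0]),
    ("J. Smith", [0.9, 0.1]),
    ("John Smith", [0.0, 1.0]),
    ("Jane Doe", [1.0, 0.0]),
]
groups = disambiguate(records)
```

Blocking keeps the pairwise comparison inside small candidate sets, so the quadratic clustering cost applies per block rather than to the whole graph.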

    Knowledge Extraction for Art History: the Case of Vasari’s The Lives of The Artists (1568)

    Knowledge Extraction (KE) techniques are used to convert unstructured information present in texts into Knowledge Graphs (KGs) that can be queried and explored. Despite their potential for cultural heritage domains, such as Art History, these techniques often encounter limitations when applied to domain-specific data. In this paper we present the main challenges that KE faces on art-historical texts, using Giorgio Vasari's The Lives of The Artists as a case study. The paper discusses the following NLP tasks for art-historical texts: entity recognition and linking, coreference resolution, time extraction, motif extraction and artwork extraction. Several strategies to annotate art-historical data for these tasks and to evaluate NLP models are also proposed

    Multimodal Search on Iconclass using Vision-Language Pre-Trained Models

    Terminology sources, such as controlled vocabularies, thesauri and classification systems, play a key role in digitizing cultural heritage. However, Information Retrieval (IR) systems that allow users to query and explore these lexical resources often lack an adequate representation of the semantics behind the user's search, which can be conveyed through multiple modalities of expression (e.g., images, keywords or textual descriptions). This paper presents the implementation of a new search engine for one of the most widely used iconography classification systems, Iconclass. The novelty of this system is the use of a pre-trained vision-language model, namely CLIP, to retrieve and explore Iconclass concepts using visual or textual queries

    Identifying and correcting invalid citations due to DOI errors in Crossref data

    This work aims to identify classes of DOI mistakes by analysing the open bibliographic metadata available in Crossref, highlighting which publishers were responsible for such mistakes and how many of these incorrect DOIs could be corrected through automatic processes. Using a list of invalid cited DOIs gathered by OpenCitations while processing the OpenCitations Index of Crossref open DOI-to-DOI citations (COCI) in the past two years, we retrieved the citations to such invalid DOIs from the January 2021 Crossref dump. We processed these citations by keeping track of their validity and of the publishers responsible for uploading the related citation data to Crossref. Finally, we identified patterns of factual errors in the invalid DOIs and the regular expressions needed to catch and correct them. The outcomes of this research show that only a few publishers were responsible for and/or affected by the majority of invalid citations. We extended the taxonomy of DOI name errors proposed in past studies and defined more elaborate regular expressions that can clean a larger number of mistakes in invalid DOIs than prior approaches. The data gathered in our study can enable investigating possible reasons for DOI mistakes from a qualitative point of view, helping publishers identify the problems underlying their production of invalid citation data. Also, the DOI cleaning mechanism we present could be integrated into the existing process (e.g. in COCI) to add citations by automatically correcting a wrong DOI. This study was run strictly following Open Science principles, and, as such, our research outcomes are fully reproducible
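
As a rough illustration of regex-based DOI cleaning, the sketch below strips a few kinds of debris from a cited DOI string. The patterns are hypothetical and are not the regular expressions defined in the study; the function name `clean_doi`, the choice of debris to strip (resolver URLs, a `doi:` label, trailing `.,;:` punctuation) and the simple `10.<registrant>/<suffix>` shape check are all assumptions.

```python
import re
from typing import Optional

# Hypothetical cleanup rules, for illustration only.
# Leading debris: a resolver URL or a "doi:" label, possibly repeated.
_PREFIX_JUNK = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)+", re.IGNORECASE)
# Trailing debris: sentence punctuation glued to the end of the DOI.
_SUFFIX_JUNK = re.compile(r"[.,;:]+$")
# Loose shape check: "10.", a 4-9 digit registrant code, "/", a non-empty suffix.
_DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def clean_doi(raw: str) -> Optional[str]:
    """Return a normalized DOI, or None if no DOI-shaped string remains."""
    candidate = raw.strip()
    candidate = _PREFIX_JUNK.sub("", candidate)
    candidate = _SUFFIX_JUNK.sub("", candidate)
    # DOI names are case-insensitive, so lowercase for comparison.
    candidate = candidate.lower()
    return candidate if _DOI_SHAPE.match(candidate) else None


examples = [
    "https://doi.org/10.1000/xyz123.",
    "DOI: 10.1000/ABC;",
    "not a doi",
]
cleaned = [clean_doi(e) for e in examples]
```

A cleaned string that passes the shape check is only a candidate: it would still need to be resolved against a DOI registry before being treated as a valid citation target.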

    Radiation Therapy for Non-Small Cell Lung Cancer in the Twenty-First Century

    Lung cancer is the biggest oncologic problem for global health, as it is the deadliest and most prevalent pathology after skin cancer. Two million patients are diagnosed every year, and around 80% of them die of the disease. Radiotherapy has been practiced for decades to treat these patients, but recently there have been important advances in this treatment for early stages (I and II), where stereotactic radiation therapy is becoming crucial. The importance of this treatment has also increased in more advanced stages (III), since intensity-modulated radiation therapy has reduced undesirable side effects. Stereotactic radiation in metastatic stages, for patients with oligometastasis, has achieved great results. Likewise, hypofractionated treatments in polymetastatic patients have improved their quality of life

    Good survival outcome of metastatic SDH-deficient gastrointestinal stromal tumors harboring SDHA mutations

    Purpose: A subset of patients with KIT/PDGFRA wild-type gastrointestinal stromal tumors show loss of function of succinate dehydrogenase, mostly due to germ-line mutations of succinate dehydrogenase subunits, with a predominance of succinate dehydrogenase subunit A. The clinical outcome of these patients seems favorable, as reported in small series in which patients were individually described. This work evaluates a retrospective survival analysis of a series of patients with metastatic KIT/PDGFRA wild-type succinate dehydrogenase-deficient gastrointestinal stromal tumors. Methods: Sixty-nine patients with metastatic gastrointestinal stromal tumors were included in the study (11 KIT/PDGFRA wild-type, of whom 6 were succinate dehydrogenase deficient, 5 were non-succinate dehydrogenase deficient, and 58 were KIT/PDGFRA mutant). All six succinate dehydrogenase-deficient patients harbored SDHA mutations. Kaplan-Meier curves and log-rank tests were used to compare the survival of patients with succinate dehydrogenase subunit A-mutant gastrointestinal stromal tumors with that of KIT/PDGFRA wild-type patients without succinate dehydrogenase deficiency and patients with KIT/PDGFRA-mutant gastrointestinal stromal tumors. Results: Follow-up ranged from 8.5 to 200.7 months. The difference between succinate dehydrogenase subunit A-mutant gastrointestinal stromal tumors and KIT/PDGFRA-mutant or KIT/PDGFRA wild-type non-succinate dehydrogenase deficient gastrointestinal stromal tumors was significant considering different analyses (P = 0.007 and P = 0.033, respectively, from diagnosis of gastrointestinal stromal tumor for the whole study population; P = 0.005 and P = 0.018, respectively, from diagnosis of metastatic disease for the whole study population; P = 0.007 for only patients who were metastatic at diagnosis). Conclusion: Patients with metastatic KIT/PDGFRA wild-type succinate dehydrogenase-deficient gastrointestinal stromal tumors harboring succinate dehydrogenase subunit A mutations present an impressively long survival. These patients should be identified in clinical practice to better tailor treatments and follow-up over time